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ABSTRACT 

Particle simulations, in which collective phenom- 
ena in plasmas are studied by following the self- 
consistent motions of many discrete particles, in- 
volve several highly repetitive sets of calculations 
that are readily adaptable to SIMD parallel pro- 
cessing. We describe a fully electromagnetic, rela- 
tivistic plasma simulation for the MPP. The particle 
motions are followed in 2 % dimensions (two spatial 
and three velocity) on a 128 x 128 grid, with pe- 
riodic boundary conditions. The two-dimensional 
simulation space is mapped directly onto the pro- 
cessor network; a Fast Fourier Transform is used to 
solve the field equations. Particle data are stored ac- 
cording to an Eulerian scheme, i.e., the information 
associated with each particle is moved from one lo- 
cal memory to another as the particle moves across 
the spatial grid. 

The method is applied to the study of the non- 
linear development of the whistler instability in a 
magnetospheric plasma model, with an anisotropic 
electron temperature. The wave distribution func- 
tion is included as a new diagnostic to allow simu- 
lation results to be compared with satellite observa- 
tions. Since the physics of self-gravitating systems 
is quite similar to plasma physics, incorporation of 
free-space boundary conditions and alteration of the 
field equations enable our code to be used for the 
study of density waves in galaxies. 

Keywords: Particle Simulation, Plasma Physics, 
Stellar Dynamics, Parallel Processing. 

1. INTRODUCTION 

The research work described in this paper belongs 

to a relatively new category: the exploratory use of 
parallel processors for the particle simulation of rar- 

efied media. By a ‘rarefied’ medium we mean one in 

which the collisional mean free paths are not neces- 
sarily small on the scale of the systems considered, 
with the result that it cannot always be approxi- 

mated satisfactorily as a continuous fluid. Plasma, 


which is the main constituent of the Universe, is a 
prime example. Among the very diverse phenomena 
that can occur in a plasma, some of the most inter- 
esting are highly nonlinear, and therefore difficult 
to analyze theoretically. For the plasma theorist, 
a powerful alternative and complement to analytic 
study is the use of particle simulation in a computer. 

‘Particle simulation’ is the generic term for com- 
putational procedures in which a medium is repre- 
sented in the computer as an assembly of discrete 
interacting particles. In a plasma the particles are 
ions and electrons, interacting through the electric 
and magnetic fields that they themselves create. At 
each time step in a simulation, the con ter has 
two distinct tasks to perform: it must update the 
positions and velocities of the particles, taking ac- 
count of their accelerations due to the fields, and 
it must update the electric and magnetic fields, the 
sources of which are, respectively, the charge den- 
sity calculated from the positions of the particles, 
and the current density calculated from both their 
positions and their velocities. 

Now, even a modest volume of plasma may con- 
tain a very large number of particles: in Earth’s 
ionosphere, for instance, there are typically 10 1 * * * * * 
electron-ion pairs per cubic metre. Computer 
simulations necessarily involve much smaller num- 
bers, not more than a few times 10 6 for present- 
day single-processor computers (‘uni-processors’), 
so each of the simulation particles actually stands 
for very many particles in the real world; they are 
sometimes called ‘superparticles* for this reason. 

Besides the reduction in the number of particles, 
other simplifications are often made in order to re- 
duce the demand for computing time. The common- 
est is to reduce the dimensionality, by considering 
systems in which all physical quantities vary in only 
one or two dimensions. This simplification can be 
very helpful, provided that it is authorized by the 
symmetry of the problem, but otherwise it is objec- 
tionable because it makes the simulation unrealistic. 
Another common way of simplifying plasma simu- 
lations is to ignore electromagnetic radiation by as- 
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suming that the speed of light is infinite. However, 
the real world is three-dimensional and the speed 
of light is finite, so most of the outstanding prob- 
lems in plasma physics that are amenable to parti- 
cle simulation will ultimately need to be tackled in 
three dimensions (3-D) using a fully electromagnetic 
(EM) code. 

Probably the most advanced EM particle code in 
existence is the TRI-dimensional STANford code 
TRISTAN, written in assembler language for the 
Cray-1 computer; see Ref. 1 for an account of an 
earlier version of this code. It follows the motion of 
about 5 X 10 6 particles in a cubical volume divided 
up into 128 3 cells, i.e., each side of the cube is di- 
vided into 128 units. These numbers are not extrav- 
agant: if anything, they err on the side of modesty, 
and codes with more particles and cells are likely to 
be required in the future. Already, however, the pre- 
liminary tests of TRISTAN have revealed a severe 
problem of computer usage. Though the assembler- 
language code has been carefully optimized, a single 
time step requires 2-3 minutes of CPU time on the 
Cray-1, and a typical simulation requires 500-1000 
steps. Difficulty in obtaining the requisite amount 
of computer time has already set back several re- 
search programs where the use of TRISTAN was 
envisaged. This is just one instance of a critical sit- 
uation which is now widely recognized, namely that 
advanced particle simulation is up against a barrier 
due to the speed limitations of uni-processors. 

Other advanced approaches to 3-D plasma simula- 
tion exist, namely the statistical methods involving 
numerical integration of the collisionless Boltzmann 
equation or of the Fokker-Planck equation. For- 
mally, they are equivalent to following the motion of 
a fluid in a 6-D space, which has three dimensions 
of velocity as well as the three dimensions of posi- 
tion. For a given number of cells in position space, 
however, these methods demand even greater com- 
puting speed. 

The prospects for large increases in the speed of uni- 
processors are not encouraging: fundamental physi- 
cal constraints on VLSI technology are expected to 
limit them to factors of less than 100. For larger in- 
creases, we must look to multi-processors, i.e., com- 
puters consisting of multiple processors arranged in 
parallel architectures, such as the MPP. 

Though the MPP was designed originally for pro- 
cessing image data from the Landsat satellites, its 
architecture, involving a large number of simple pro- 


cessors with nearest-neighbor connections, is well 
suited to the particle simulation of rarefied media, 
and incidentally to fluid simulation as well (Ref. 2). 
The motion of particles or fluid in a given spatial 
cell is determined only by conditions prevailing in- 
side that cell, and at its boundaries with neighbor- 
ing cells. (At least, this is the case so long as we 
refrain from making the approximation in which dis- 
turbances propagate across the array of cells instan- 
taneously). Hence problems concerning such motion 
can be mapped readily onto simple arrays of proces- 
sors in which direct connections exist only between 
nearest neighbors. 

In the present program of research, our initial aims 
were to gain enough experience in the use of the 
MPP to be able to answer the following questions: 

• Which of our plasma simulation problems can be 
solved on this type of multi-processor? 

• Is there a significant gain in speed, compared with 
plasma simulations on a uni-processor? 

• Could the capability of the MPP for plasma sim- 
ulation be improved in any simple way? 

• Would any other type of multi-processor be better 
suited to our problems, and if so which? 

In sum, we wished to investigate the potentialities 
and limitations of massively parallel processors for 
plasma particle simulation. 

Shortly after the investigation began, however, its 
scope was extended to include particle simulation 
of problems in stellar dynamics, this in collabora- 
tion with Dr. Bruce Smith of NASA Ames Research 
Center. Stellar dynamics is very similar to plasma 
electrodynamics in respect of its basic physics. Both 
plasmas and stellar systems are examples of rarefied 
media, and problems concerning either of them are 
cases of the classical N-body problem with inverse 
square law interactions. Each, of course, also has its 
specificity: stellar dynamics involves long-range in- 
teractions, from which plasmas are exempt because 
of Debye shielding; plasma electrodynamics involves 
magnetic fields, which have no counterpart in non- 
relativistic stellar dynamics. Nevertheless, the two 
fields have much in common, so naturally there are 
close similarities between the methods of particle 
simulation that have been developed for each of 
them. For the same reason, many possibilities exist 
for cross-fertilization between them, and for joint 
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efforts to solve common difficulties. These consid- 
erations led us to extend the scope of the program 
in this way. 

The present paper is an account of the work per- 
formed during the first year. Section 2 outlines 
the two physical problems, one in plasma electro- 
dynamics and the other in stellar dynamics, that 
have been chosen for simulation, and how we are ap- 
proaching them; for simplicity, both simulations are 
in two spatial dimensions instead of three. Section 
3 describes the numerical techniques used for the 
plasma simulation, and Section 4 the modifications 
required for the gravitational simulation. Section 5 
indicates how these various algorithms are being im- 
plemented on the MPP, while the current status of 
the work is described in Section 6. Finally, in Sec- 
tion 7, we draw some provisional conclusions and 
sketch our plans for the future, in both the short 
and long terms. 


2. PHYSICS PROBLEMS 

Recapitulating, our immediate objectives are to pro- 
gram the MPP to solve, by particle simulation, a 
significant problem in each of the two fields men- 
tioned above, namely stellar dynamics and plasma 
electrodynamics. 

The simpler of the two problems is the one in plasma 
electrodynamics, which concerns the nonlinear de- 
velopment of the Doppler-shifted electron gyroreso- 
nance instability, or ‘whistler instability* for short. 
It is a very suitable problem for solution on the 
MPP, for several reasons. Firstly, it is one in which 
only the electrons are involved, while the ions can 
be treated as an inert neutralizing background. The 
simulation algorithms are simpler if only one type 
type of particle needs to be represented. Secondly, 
the whistler instability is of the velocity-space va- 
riety, so it can occur even in a spatially uniform 
plasma. By assuming uniformity as an initial con- 
dition, we should succeed in sharing the comput- 
ing load evenly between the different processors. 
Finally, it is an electromagnetic instability, which 
means that field disturbances originating in any one 
spatial cell propagate to the other cells at finite ve- 
locities. Such propagation can be modeled read- 
ily on the MPP, in which each processor is con- 
nected only to its four nearest neighbors. Despite 
its apparent simplicity, the nonlinear evolution of 
the whistler instability has not been simulated be- 
fore in the conditions in which we intend to do so. 


This problem has applications both to space and to 
fusion plasma physics. 

To the best of our knowledge, all previous simu- 
lations of the whistler instability have been one- 
dimensional. The most recent, by Bharuthram and 
Baboolal (Ref. 3), were in several other respects 
quite similar to those that we are proposing. Thus, 
for instance, these authors used a fully electromag- 
netic code, gave the electrons a bi-Maxwellian ve- 
locity distribution at the outset, and followed the 
growth of the instability into the nonlinear regime. 
They obtained a variety of interesting scientific re- 
sults: for instance, the wave mode that, during 
the initial linear phase of the instability, grew most 
rapidly, died out in the post-saturation phase, and 
ultimately was replaced by a slower- growing mode 
which dominated the wave spectrum at the end of 
the simulation. 

Using our MPP code, we intend to study the nonlin- 
ear evolution of the whistler instability under much 
the same conditions, except that our simulations 
will be performed in 2|-D (i.e., in two spatial dimen- 
sions, but with all three components of velocity). 
We expect that the increase in dimensionality will 
lead to qualitatively new scientific results: in par- 
ticular, we anticipate that waves will be generated 
over a broad frequency band, with their wave nor- 
mal directions spread widely around the direction 
of the steady magnetic field. We shall study how 
the characteristics of the wave field in the nonlinear 
regime depend on the initial plasma conditions. 

For these simulations, the code will have to be en- 
hanced with sophisticated diagnostics, especially for 
the fields. The fact that it already includes a two- 
dimensional FFT should be very helpful in this re- 
spect, enabling wave power spectra, dispersion re- 
lations, temporal autocorrelation functions, etc., to 
be derived by existing techniques, developed previ- 
ously for serial processors. 

Additionally, we propose to employ a new diagnostic 
procedure, hitherto used only for analyzing whistler- 
mode wave data from satellites in the Earth’s mag- 
netosphere. This is the so-called ‘wave distribution 
function’ (WDF), which specifies how the wave en- 
ergy density is distributed with respect to frequency 
and to wave normal direction (Refs. 4-7). The ready 
availability of this diagnostic will, in the future, fa- 
cilitate comparisons between the results from nu- 
merical simulations of the whistler instability, and 
observations of natural whistler-mode wave fields 
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generated by the same instability in space. 

In stellar dynamics, we shall investigate the stability 
of an axisymmetric system (galaxy or protoplane- 
tary nebula), albeit that problems of this type have 
been simulated previously on conventional comput- 
ers. The interest of repeating one of these simula- 
tions on the MPP is that it will introduce some of 
the difficulties that are bypassed in the whistler in- 
stability problem. Although, again, the dynamics 
involve only one type of particle, the absence of a 
neutralizing background of particles of another type 
means that the inter-particle forces are not shielded 
in any way, so they are truly long-range. For this 
reason, all physically significant systems are spa- 
tially non-uniform. Finally, the times required for 
gravity to propagate across such systems are very 
small compared with the time scales characteriz- 
ing their dynamics (e.g., the rotation period of a 
galaxy), so little accuracy is lost and much comput- 
ing time saved by assuming that the propagation 
is instantaneous. Thus, besides the physical inter- 
est of this problem, its simulation on the MPP will 
have interesting computer science aspects. 

3. NUMERICAL TECHNIQUES 

Details of the particle-mesh method may be found in 
the standard texts on plasma simulation (e.g., Refs. 
8-10). We summarize here the techniques chosen 
for the MPP code. 

The purpose of the simulation model is to follow the 
self-consistent dynamics of fields and particles for 
several thousand time steps. Fields, densities and 
other spatial functions are stored as values at points 
on a grid. Initially, our simulation is restricted to 
two spatial dimensions for simplicity. The motion 
of a large number of particles (of the order of 10 5 
to 10 6 ), each representing many particles in the real 
world, is followed through successive time steps. 

Initially, particles are arranged in positions (x, y) 
that result in the required plasma configuration. For 
our present purposes, uniform density is specified. 
Particles are assigned random velocities (u x ,t? y ,t;*) 
according to a Gaussian distribution with suitable 
thermal and drift velocities. 

As is usual in numerical simulations, point particles 
are replaced by clouds of charge to prevent impulse- 
like interactions which give rise to collisional effects 
(Ref. 11). Long-range particle potentials are re- 
tained in the Coulomb form to model collective ef- 


fects, but a shape function 

5 ( r ) = ^2 exp(-r 2 /2o 2 ) (3.1) 

is introduced so as to ‘soften’ the particle potential 
within the range r < a (usually of the order of a 
Debye length). 

In order to calculate the fields set up by the charges, 
it is necessary to obtain the charge and current den- 
sities p and J on the spatial grid. A simple al- 
gorithm is used which produces the same results 
as the Subtracted Dipole Scheme (Ref. 12). The 
charge density array is compiled from the superpo- 
sition of a unit charge at each particle’s nearest grid 
point (NGP), plus small contributions at its four 
neighboring grid points which reproduce the dipole 
moment of the charge relative to its NGP. 

The electric and magnetic fields (E and B) are gov- 
erned by Maxwell’s equations. With the inclusion 
of the shape factor, Ampere’s and Faraday’s laws 
become 


^ = c [V x B(r) — 4 tt J(r) * ■S'(r)] (3.2) 

and 

® = -<VxE(r), (3.3) 

where the asterisk denotes convolution and c is the 
speed of light. Gauss’ law, 

V • E(r) = 47rp(r) * 5(r) (3.4) 

is satisfied by the initial conditions and will continue 
to hold if the continuity equation is satisfied. How- 
ever, microscopic inconsistencies between p and J 
arise due to the use of the mesh (Ref. 13). To pre- 
vent the resulting growth of noise in the fields, it is 
common to split the electric field into longitudinal 
and transverse components Ej and E t (Ref. 9), us- 
ing Gauss’ law to obtain E i and Faraday’s law to 
deduce E t . It is thus necesary to solve both ellip- 
tic and hyperbolic equations in this system; the use 
of Fourier or Hartley transforms provides a simple 
solution, as well as helpful diagnostics. 
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The field equations are integrated in time using a 
leap-frog method with time step At. Fourier trans- 
forming (3.2), (3.3) and (3.4) and including the time 
differencing gives 


E” (k) = Er 1 (k)+cAf{tkxB n_ 2(k)- 
4?r S (k) [ J n_ 2 (k) - (J""2(k) k/A: 2 )k]} (3.5) 

B n+ 2(k) = B n "2(k)- t cAtkxE"(k) (3.6) 

EP(k) = — 4jrp n (k) 5 (k) k/k 2 (3.7) 

The superscripts denote the time step at which the 
field is known: 


f n = f{t 0 +nAt) (3.8) 

The stability of the method is governed by a 
Courant-Friedrichs-Levy (CFL) condition: 

(fc ma xcAt ) 2 < i (3 9) 

where A: max is the highest spatial frequency occur- 
ring in the simulation. 

Once the fields are known on the spatial grid, the 
forces on the particles may be calculated by inter- 
polation. The relativistic equation of motion is 


electric field only for half a time step, then a rota- 
tion of the resulting intermediate momentum vector 
by the magnetic field is carried out and finally the 
second ‘half-push’ by the electric field is performed. 

One of the advantages of applying the full relativis- 
tic treatment is that a top speed for both field and 
particle propagation is provided, and can easily be 
tailored so that the CFL condition (3.9) is obeyed. 

Advancement of the particles’ positions closes the 
main simulation loop, since the charge and current 
densities may be evaluated again. 


4. GRAVITATIONAL SIMULATION 

Particle-mesh simulation is a well-known technique 
in the investigation of the evolution of spiral galaxies 
(Ref. 8 ). Since the algorithms for gravitational sim- 
ulations are so closely related to those for plasma, it 
seems worthwhile to construct them in parallel. The 
main differences between the programs are summa- 
rized here. 

The gravitational potential ^(r) is used instead of 
the fields. It is governed by Poisson’s equation 


VV(r) = 4 tt G p(r) (4.1) 

where p is the mass density and G the gravitational 
constant. Solutions are obtained by convolving the 
Green’s function for this equation, i.e., the single 
particle potential, with the particle number density 
function p/mo. The point particle potential 


4 > = - 


Gmp 

r 


(4.2) 


£ -,E+(-S-)Sii» (3.10) 

at \mocJ 7 

where p is the relativistic momentum, mo is the rest 
mass, and 


7 = 1/yfl — v^/c 2 (3.11) 


is replaced by the ‘soft’ potential 


x _ Grno 


(4.3) 


so that shape factors do not appear explicitly in 
this formulation. The convolution is accomplished 
in Fourier transform space. 


A problem arises because of the appearance of the 
momentum in the vector product in equation (3.9). 
A well-established solution due to Boris (Ref. 14) 
is employed: the momentum is advanced using the 


For modeling isolated galactic systems, free-space 
boundary conditions are imitated by confining the 
particles to 1/4 of the grid, a square of sides L, and 
truncating <j> p beyond x = ±L / 2 and y = ±L / 2 
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(Ref. 8). Although periodic images of the simu- 
lation plane are still produced by the fast Fourier 
transform, they can no longer influence each other 
across the empty buffer zone. 


5. IMPLEMENTATION ON THE MPP 

It will be seen from the above description that there 
are two main areas of the simulation algorithm with 
a high degree of parallelism: handling the fields 
and densities on the spatial grid, and processing 
the large number of particles moving though the 
space. This dichotomy appears in the data struc- 
ture adopted for the simulation. 

In the simplest approach, a spatial grid of 128 X 128 
cells is mapped directly onto the two-dimensional 
array of processing elements (PE’s), as illustrated 
in Figure 1. The local memory holds p, J, Ej, E t 
and B associated with the corresponding grid point. 
The contents of the array unit (ARU) memory may 
be shifted across the network and wrapped around 
at the edges; extensive use is made of this feature in 
the fast Fourier transform for solving the field equa- 
tions. Because it is a global method, the FFT has 
the disadvantage of requiring communication be- 
tween distant processors in the network. An attrac- 
tive alternative is to employ local methods of field 
solution, in which each processor would demand 
data from its nearest neighbours only (Ref. 15). 
However, the FFT is retained for the present as a 
useful general tool, since its numerical stability is 
well understood, and also it provides standard di- 
agnostics. 

The particle data structure presents a more inter- 
esting problem. Provision must be made for the 
information associated with the particles to move 
across the network in order to interact with appro- 
priate data on the grid. Two alternative schemes, 
which may be used in some combination, exist. 

In one disposition, information for a given particle is 
stored in the memory location corresponding to its 
position on the grid (Figure 1). Local calculations 
of densities and forces may then be made without 
communication beween processors. However, the 
packet of information must be moved when the par- 
ticle travels to a new cell in the grid. This method 
has the disadvantages that memory overflow may 
occur in some locations, and that the workload is 
not distributed evenly among the processors unless 
the particle density is more or less uniform. 


ARU memory 


particle 

planes 

global 

variables 

simulation 

grid 



Figure 1. Mapping the particle data to memory. 
Each local memory contains data for particles in 
the corresponding cell in the simulation plane. 


The alternative is to store particles in a uniform way 
in memory, in arbitrary locations. Particle and/or 
field data must then be transported across the net- 
work to a common processor when calculations in- 
volve both types. This scheme should prove useful 
when large density variations exist or when particles 
move though several cells per time step; Hoshino 
and Takenouchi (Ref. 16) proved a variation of it 
to be the more efficient scheme in a particle- particle 
molecular dynamics model on the PAX parallel pro- 
cessor. 

In the case of an initially homogeneous plasma, how- 
ever, large density differences are not produced ex- 
cept in extreme circumstances, and particles may 
be restricted from traveling more than one cell in a 
time step by setting the speed of light to this value. 
For our present purposes, therefore, we decided to 
use the first of the two schemes outlined above. The 
scheme also has the attraction of being a ‘natural’ 
mapping, and for this reason a variant of it was 
used in a 3-dimensional particle-mesh simulation of 
galactic gas dynamics on the ICL DAP by Johns 
and Nelson (Ref. 17). 
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Figure 2 shows a flow chart outlining the algorithm. 
Calculations are carried out in two major proce- 
dures, titled ‘Particles’ and ‘Fields’. The only global 
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Figure 2. Flow chart for the plasma simulation al- 
gorithm. 


parallel arrays that have to be carried between these 
are the variables required by both: p and J (the 
output of ‘Particles’ and the input of ‘Fields’), and 
E/, E t and B (the output of ‘Fields’ and the input 
of ‘Particles’). These data are stored in the ARU 
throughout the main time loop. 

Particle data exist in the ARU only as local vari- 
ables in the procedure ‘Particles’, and are stored 
in the staging memory (SM) during the rest of the 
execution. The procedure contains a loop which is 
executed once for each plane of particle data in the 
SM, as follows. A ‘particle plane’ is moved from 
the SM to the ARU. Values of the magnetic field 
and the total electric field are interpolated in each 


cell at the particle positions (x, y). (Operations are 
masked out in cells containing no particles). New 
velocity components are obtained by applying the 
Boris push, and the position is updated with the 
new velocity. The new co-ordinates (x, y) are tested 
to see whether the particle has moved to a new cell 
on the simulation grid, and a moving direction D 
assigned accordingly. The speed of light is normal- 
ized to one cell per timestep, so that no particle 
can move further than to one of its eight nearest 
neighbours; hence there are nine possible values of 
D, including the null move. 

An inner loop is then carried out for each direction. 
The particles with appropriate D are copied into a 
workplane which is shifted to the new location. The 
densities p and J are updated with the new informa- 
tion. Distribution functions are accumulated when 
needed via a cascade sum. The workplane contents 
are then copied to the top of the stack of particles 
already moved. If full planes exist in the stack, they 
are sent to the SM. The next SM plane is then re- 
trieved. After the last plane has been processed, the 
remaining particle planes are written out to the SM 
to clear the ARU of particle data. 

In a fully electromagnetic program, the field solver 
requires nearly all the available ARU space. The 1- 
dimensional FFT that we use is based on a straight- 
forward Cooley- Tukey algorithm (Ref. 18), and op- 
erates on 128 elements along either rows or columns. 

Equations 3.5 to 3.7 are solved in transform space 
using the E /, E t , and B values from the previous 
time step. The shape factor and coefficients involv- 
ing k are recalculated at each time step to minimize 
the space required for global variables. Transverse 
fields are truncated for k greater than some value 
specified by the user to meet the CFL condition 3.9. 
The spatial variation of one component of the field 
of a single particle is shown in Figure 3. 

Particle initialization is accomplished in ‘Particles’ 
by setting up co-ordinates in the ARU plane by 
plane. A random number generator based on a ran- 
dom bit-plane generator developed at NASA God- 
dard Space Flight Center is used to initialize veloc- 
ities. Up to twelve uniformly distributed random 
numbers are added to produce a distribution func- 
tion closely approximating a Gaussian. Field ini- 
tialization takes place through the inclusion of ‘old’ 
fields in Maxwell’s equations: external fields are in- 
troduced in this way. 








Figure 3. Spatial variation of the rr-component of the longitudinal electric field due to a charge with radius 
1 grid cell placed at the center of the grid. Units are arbitrary. 


6. STATUS OF THE CODE 

Considerable time has been spent in developing ba- 
sic tools such as the FFT, the random number gen- 
erator, input/output routines, etc. These are now 
in working order. The particle memory manager is 
complete and is being tested with the particle ini- 
tialization. The calculation section of ‘Particles’, 
which contains the interpolation, the Boris push, 
the distribution function calculation and the density 
accumulation, is in the final stages of development. 
The procedure ‘Fields’ awaits only the debugging of 
the transverse field section; preliminary results have 
been shown in Figure 3. Thus, as we go to press, 
the code appears to be close to final assembly. 

7. CONCLUSIONS 

Since we have not yet performed a complete sim- 
ulation on the MPP, it would be premature to 
try to give any firm answers to the four questions 
raised in Section 1. Nevertheless, we feel that suffi- 
cient progress has been made to allow us to express 
opinions as to what the answers are likely to be. 


For instance, it is already clear that the two physics 
problems we have chosen, one in plasma physics and 
the other in stellar dynamics, can be simulated on 
the MPP without posing any difficulties of princi- 
ple. The main practical difficulty is likely to be that 
of load balancing, i.e., of ensuring that the com- 
putational work load is shared evenly between all 
the processors in the array. In our present com- 
putational scheme, where the simulation domain 
is mapped directly onto the processor array, load 
balancing becomes difficult whenever the physical 
medium is inhomogeneous. Thus our stellar dy- 
namics problem is more difficult, in this respect, 
than our problem in plasma physics, as may be seen 
from their descriptions in Section 2. Other plasma 
physics problems, however, such as those involving 
magnetically confined plasmas, would also be sub- 
ject to this difficulty, which we perceive as the main 
one to be overcome in order that the potential of 
the MPP for particle simulation of rarefied media 
may be fully realized. The foregoing remarks relate 
to the first two questions raised in Section 1. 

As regards the third question, we can already sug- 
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gest several features that might be incorporated into 
a second- generation MPP, so as to improve its capa- 
bility for particle simulation. There is a clear need 
for a larger random-access memory (RAM) associ- 
ated with each processing element (PE), particu- 
larly if some future simulations are to be made in 
3-D. At present, with only 1 Kbit of RAM per PE, 
we would be restricted to 2-D simulations with fewer 
than 10 particles per spatial cell, were it not that we 
are also using the staging memory for storing data 
on the positions and velocities of the particles; how- 
ever, the use of the stager for this purpose entails 
a speed penalty. Another measure that might be 
taken to increase the computing speed is to intro- 
duce some degree of parallelism into the PE arith- 
metic; the present capability of varying the word 
length by 1-bit increments would probably have to 
be sacrificed, but this feature is less important in 
particle simulation than it is in image processing. 

On the fourth and last question, as to the rela- 
tive merits of the MPP architecture versus possible 
alternative architectures in the application to rar- 
efied media simulation, we are now doubtful as to 
whether whether our present research program will 
yield a clear-cut answer. At the outset, we felt that 
the very simple way in which our square simula- 
tion domain can be mapped directly onto the square 
array of PE’s made the MPP architecture a natu- 
ral choice, but now we are less sure, in view of the 
acuteness of the load-balancing difficulty mentioned 
above. In any case, a final decision could not be 
taken until after some experimentation with other 
architectures such as the hypercube. 

These considerations govern our plans for the fu- 
ture. In the short term, continuing the present re- 
search program through a second year, we shall en- 
deavor to fully realize our immediate physics and 
computer science objectives by exploiting the pos- 
sibilities of the MPP to the utmost. The physics 
objectives are to achieve realistic 2-D simulations 
of the chosen plasma and gravitational phenomena. 
The computer science objectives are to improve the 
simulation algorithms, notably by solving for the 
fields by direct numerical integration of Maxwell’s 
equations rather than by transform methods. More- 
over, in order to promote load balancing, we shall 
investigate the use of sort/merge routines to man- 
age the particles, permitting the data concerning 
them to be stored in PE’s other than those that 
correspond directly to their spatial positions in the 
simulation domain. Without going so far as to store 
these data in arbitrary locations, we feel that it may 


be helpful to allow them a certain latitude. This 
more flexible storage scheme should certainly help to 
reduce load unbalance resulting from random fluctu- 
ations of particle concentration in a statistically uni- 
form medium, such as the plasma considered in our 
whistler instability simulation; it is unlikely, how- 
ever, to be able to cope with unbalance resulting 
from large-scale gradients of particle concentration, 
as occur in our gravitational simulation. 

In the long term, more powerful solutions to the 
problem of load balancing will have to be found. 
Probably the sole viable general solution is to di- 
vide up the simulation domain with a non-uniform 
grid, which is then mapped onto the processor array 
either directly, or with a certain degree of latitude 
as described above. If the large-scale distribution of 
particles is likely to change in the course of a sim- 
ulation, then the grid may have to be adaptive as 
well. We hope to take part in these developments, 
which could be tried out on the MPP in its present 
form. Only when this problem has been adequately 
dealt with can a fair comparison be made with other 
architectures. We anticipate that the comparison is 
likely to favor the MPP architecture, and to provide 
strong motivation for the development of a second- 
generation MPP specifically for particle and fluid 
simulation. 
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